65 research outputs found

    Atomic hydration potentials using a Monte Carlo Reference State (MCRS) for protein solvation modeling

    BACKGROUND: Accurate description of protein interaction with aqueous solvent is crucial for modeling of protein folding, protein-protein interaction, and drug design. Efforts to build a working description of solvation, both by continuous models and by molecular dynamics, yield controversial results. Specifically constructed knowledge-based potentials appear to be promising for accounting for solvation at the molecular level, yet have not been used for this purpose. RESULTS: We developed original knowledge-based potentials to study protein hydration at the level of atom contacts. The potentials were obtained using a new Monte Carlo reference state (MCRS), which simulates the expected probability density of atom-atom contacts via exhaustive sampling of structure space with random probes. Using the MCRS allowed us to calculate the expected atom contact densities with high resolution over a broad distance range, including very short distances. Knowledge-based potentials for hydration of protein atoms of different types were obtained from the frequencies of their contacts, at different distances, with protein-bound water molecules in a non-redundant training database of 1776 proteins with known 3D structures. Protein hydration sites were predicted in a test set of 12 proteins with experimentally determined water locations. The MCRS greatly improves prediction of water locations over existing methods. In addition, the contribution of the macromolecular solvation energy to the total folding free energy was estimated and tested in fold recognition experiments. Based on the structure hydration energy alone, the correct folds were preferred over all the misfolded decoys for the majority of proteins from the improved Rosetta decoy set. CONCLUSION: MCRS atomic hydration potentials provide a detailed distance-dependent description of the hydropathies of individual protein atoms. This allows placement of water molecules on the surface of proteins and in protein interfaces with much higher precision. The potentials provide a means to estimate the total solvation energy for a protein structure, in many cases achieving successful fold recognition. Possible applications of atomic hydration potentials to structure verification, protein folding and stability, and protein-protein interactions are discussed.
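The abstract above does not give the formula used to turn contact frequencies into potentials. As a rough illustration only, the sketch below applies the generic inverse-Boltzmann relation commonly used for knowledge-based potentials, E(r) = -kT ln(observed(r) / expected(r)), where the expected densities would come from a reference state such as the MCRS. The function name, the `kT` default, the pseudocount, and the toy numbers are all assumptions, not taken from the paper.

```python
import math

def hydration_potential(observed, expected, kT=1.0, pseudocount=1e-9):
    """Inverse-Boltzmann sketch: one energy value per distance bin.

    observed -- contact densities counted in known structures (hypothetical input)
    expected -- contact densities from a reference state (hypothetical input)
    kT       -- energy unit; 1.0 gives energies in units of kT (assumption)
    """
    energies = []
    for obs, exp in zip(observed, expected):
        # The pseudocount guards against log(0) in empty bins (illustrative choice).
        ratio = (obs + pseudocount) / (exp + pseudocount)
        energies.append(-kT * math.log(ratio))
    return energies

# Toy data: bin 0 is enriched in water contacts, bin 1 is neutral, bin 2 is depleted.
toy_potential = hydration_potential([2.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Under this convention a bin where contacts occur more often than expected gets a negative (favourable) energy, and a depleted bin gets a positive one.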

    A model of evolution with constant selective pressure for regulatory DNA sites

    BACKGROUND: Molecular evolution is usually described assuming a neutral or weakly non-neutral substitution model. Recently, new data have become available on the evolution of sequence regions under selective pressure, e.g. transcription factor binding sites. To reconstruct the evolutionary history of such sequences, one needs evolutionary models that take into account a substantial constant selective pressure. RESULTS: We present a simple evolutionary model with a single preferred (consensus) nucleotide and the neutral substitution model adopted for all other nucleotides. This evolutionary model has a rate matrix in which all substitutions that do not involve the consensus nucleotide occur at the same rate. The model has two time scales for achieving a stationary distribution; in the general case only one of the two rate parameters can be evaluated from the stationary distribution. In the middle-time zone, a counterintuitive behavior was observed for some parameter values, with a probability of conservation for a non-consensus nucleotide greater than that for the consensus nucleotide. Such an effect can be observed only in the case of weak preference for the consensus nucleotide, when the probability of observing the consensus nucleotide in the stationary distribution is less than 1/2. If the substitution rate is represented as a product of mutation and fixation, only the fixation can be calculated from the stationary distribution. The exhibited conservation of non-consensus nucleotides does not take place if the elements of the mutation matrix are identical, and can be related to the reduced mutation rate between the non-consensus nucleotides. This bias can have no effect on the stationary distribution of nucleotide frequencies calculated over the ensemble of multiple alignments, e.g. transcription factor binding sites upstream of different sets of co-regulated orthologous genes. CONCLUSION: The derived model can be used as a null model when analyzing the evolution of orthologous transcription factor binding sites. In particular, our findings show that a nucleotide preferred at some position of a multiple alignment of binding sites for some transcription factor in the same genome is not necessarily the most conserved nucleotide in an alignment of orthologous sites from different species. However, this effect can take place only in the case of a mutation matrix whose elements are not identical.

    The complete genome sequence of Pantoea ananatis AJ13355, an organism with great biotechnological potential

    Pantoea ananatis AJ13355 is a newly identified member of the Enterobacteriaceae family with promising biotechnological applications. This bacterium is able to grow at an acidic pH and is resistant to saturating concentrations of L-glutamic acid, making it a suitable host for the production of L-glutamate. In the current study, the complete genomic sequence of P. ananatis AJ13355 was determined. The genome was found to comprise a single circular chromosome of 4,555,536 bp [DDBJ: AP012032] and a circular plasmid, pEA320, of 321,744 bp [DDBJ: AP012033]. After automated annotation, 4,071 protein-coding sequences were identified in the P. ananatis AJ13355 genome. For 4,025 of these genes, functions were assigned based on homologies to known proteins. A high level of nucleotide sequence identity (99%) was revealed between the genome of P. ananatis AJ13355 and the previously published genome of P. ananatis LMG 20103. Short colinear regions, which are identical to DNA sequences in the Escherichia coli MG1655 chromosome, were found to be widely dispersed along the P. ananatis AJ13355 genome. Conjugal gene transfer from E. coli to P. ananatis, mediated by homologous recombination between short identical sequences, was also experimentally demonstrated. The determination of the genome sequence has paved the way for the directed metabolic engineering of P. ananatis to produce biotechnologically relevant compounds.

    CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation

    Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory ‘grammar’, or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method, CORECLUST (COnservative REgulatory CLUster STructure), that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may subsequently be used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

    Defensin-like peptides in wheat analyzed by whole-transcriptome sequencing: a focus on structural diversity and role in induced resistance

    Antimicrobial peptides (AMPs) are the main components of the plant innate immune system. Defensins represent the most important AMP family involved in defense and non-defense functions. In this work, global RNA sequencing and de novo transcriptome assembly were performed to explore the diversity of defensin-like (DEFL) genes in the wheat Triticum kiharae and to study their role in induced resistance (IR) mediated by the elicitor metabolites of a non-pathogenic strain FS-94 of Fusarium sambucinum. Using a combination of two pipelines for DEFL mining in transcriptome data sets, as many as 143 DEFL genes were identified in T. kiharae, the vast majority of which represent novel genes. According to the number of cysteine residues and the cysteine motif, wheat DEFLs were classified into ten groups. Classical defensins with a characteristic 8-Cys motif, assigned to group 1 DEFLs, represent the most abundant group, comprising 52 family members. DEFLs with a characteristic 4-Cys motif CX{3,5}CX{8,17}CX{4,6}C, named group 4 DEFLs and previously found only in legumes, were discovered in wheat. Within DEFL groups, subgroups of similar sequences that originated by duplication events were isolated. Variation among DEFLs within subgroups is due to amino acid substitutions and insertions/deletions of amino acid sequences. To identify IR-related DEFL genes, transcriptional changes in DEFL gene expression during elicitor-mediated IR were monitored. Transcriptional diversity of DEFL genes in wheat seedlings in response to the fungus Fusarium oxysporum, FS-94 elicitors, and the combination of both (elicitors + fungus) was demonstrated, with specific sets of up- and down-regulated DEFL genes. DEFL expression profiling allowed us to gain insight into the mode of action of the elicitors from F. sambucinum. We discovered that the elicitors up-regulated a set of 24 DEFL genes. After challenge inoculation with F. oxysporum, another set of 22 DEFLs showed enhanced expression in IR-displaying seedlings. These DEFLs, in concert with other defense molecules, are suggested to determine enhanced resistance of elicitor-pretreated wheat seedlings. In addition to providing a better understanding of the mode of action of the elicitors from FS-94 in controlling diseases, up-regulated IR-specific DEFL genes represent novel candidates for genetic transformation of plants and development of pathogen-resistant crops.
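The 4-Cys motif CX{3,5}CX{8,17}CX{4,6}C quoted in the abstract above can be read as a pattern over one-letter amino-acid codes. The sketch below translates it into a Python regular expression, assuming that "X" means any residue (the abstract does not say whether cysteine is excluded). The function name and the toy sequence are hypothetical and are not real wheat DEFL data.

```python
import re

# Assumption: "X" in the motif stands for any one-letter amino-acid code.
GROUP4_MOTIF = re.compile(r"C[A-Z]{3,5}C[A-Z]{8,17}C[A-Z]{4,6}C")

def find_group4_motif(protein_seq):
    """Return the (start, end) span of the first motif match, or None."""
    match = GROUP4_MOTIF.search(protein_seq.upper())
    return match.span() if match else None

# Toy sequence built to fit the spacing rules: C, 4 residues, C, 10 residues,
# C, 5 residues, C, flanked by arbitrary residues.
toy_seq = "MK" + "C" + "A" * 4 + "C" + "G" * 10 + "C" + "S" * 5 + "C" + "K"
toy_span = find_group4_motif(toy_seq)
```

A real mining pipeline would also need signal-peptide handling and length filters, which this sketch leaves out.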

    Employing toxin-antitoxin genome markers for identification of Bifidobacterium and Lactobacillus strains in human metagenomes

    Recent research has indicated that, in addition to a unique genotype, each individual may also have a unique microbiota composition. Differences in microbiota composition may arise from both its species and strain constituents. Knowing the precise composition is especially important for the gut microbiota (GM), since it can contribute to health assessment, personalized treatment, and disease prevention for individuals and groups (cohorts). Existing methods for determining the species and strain composition of microbiota are not always precise and are often difficult to use. Probiotic bacteria of the genera Bifidobacterium and Lactobacillus are an essential component of the human GM. Previously we have shown that in certain Bifidobacterium and Lactobacillus species the RelBE and MazEF superfamilies of toxin-antitoxin (TA) systems may be used as functional biomarkers to differentiate these groups of bacteria at the species and strain levels. We have compiled a database of TA genes of these superfamilies specific for all Lactobacillus and Bifidobacterium species with complete genome sequences and confirmed that in all Lactobacillus and Bifidobacterium species TA gene composition is species and strain specific. To analyze the species and strain composition of these two bacterial genera in the human GM, we developed the TAGMA (toxin antitoxin genes for metagenomes analyses) software, based on polymorphism in TA genes. TAGMA was tested on gut metagenomic samples. The results of our analysis show that TAGMA can be used to characterize species and strains of Lactobacillus and Bifidobacterium in metagenomes.

    De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    BACKGROUND: Transcriptome sequencing data have become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in DNA sequencing technologies, such data are lacking for many groups of living organisms, in particular many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. RESULTS: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. CONCLUSIONS: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated.

    Functional annotation of human long noncoding RNAs via molecular phenotyping

    Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes, and yet their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2. Peer reviewed.

    An integrated expression atlas of miRNAs and their promoters in human and mouse

    MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.